Start by loading my libraries:

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(rstan)
## Loading required package: StanHeaders
## rstan (Version 2.21.2, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## 
## Attaching package: 'rstan'
## The following object is masked from 'package:tidyr':
## 
##     extract
library(bayesplot)
## This is bayesplot version 1.8.1
## - Online documentation and vignettes at mc-stan.org/bayesplot
## - bayesplot theme set to bayesplot::theme_default()
##    * Does _not_ affect other ggplot2 plots
##    * See ?bayesplot_theme_set for details on theme setting
library(bayesrules)

Question 6.1: Steps for grid approximation

  1. Identify the steps for grid approximation of a posterior model

  Step 1: define a discrete grid of theta values
  Step 2: evaluate the prior and likelihood at each theta value
  Step 3: obtain a discrete approximation of the posterior by (a) taking the product of the prior and the likelihood at each theta value and (b) normalizing the products so they sum to 1
  Step 4: randomly sample N theta values, with replacement, weighted by their normalized posterior probabilities

  2. Which steps would you change to make the approximation more accurate? How would you change them?

You can change step 1 to define a finer grid. The more grid values you use, the ‘clearer’ the picture of the posterior becomes.
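The effect of step 1 can be checked directly. Here is a minimal sketch (my own, using the Beta(3,8) prior and Y = 2 successes in n = 10 trials from question 6.5 below) showing that the posterior mean implied by the grid approaches the true Beta(5,16) mean of 5/21 as the grid gets finer:

```r
# Posterior mean implied by a grid approximation of the Beta-Binomial
# model from question 6.5: prior Beta(3,8), Y = 2 successes in n = 10 trials
grid_mean <- function(n_grid) {
  pi_grid <- seq(from = 0, to = 1, length = n_grid)
  unnormalized <- dbeta(pi_grid, 3, 8) * dbinom(2, 10, pi_grid)
  posterior <- unnormalized / sum(unnormalized)
  sum(pi_grid * posterior)
}

grid_mean(5)    # coarse grid: noticeably off the true mean
grid_mean(201)  # fine grid: very close to the true mean
5 / 21          # true posterior mean of Beta(5,16), about 0.238
```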

Question 6.2: Trace plot diagnostics

For each MCMC simulation scenario, sketch what a single-chain trace plot might look like:

Different Mixing Scenarios

Question 6.3: MCMC Woes

For each simulation scenario below, describe how the scenario could impact the posterior approximation:

  1. The chain is mixing too slowly

This could mean the chain doesn’t get sufficiently close to the true posterior within the number of iterations you run, so the posterior approximation you end up with will be inaccurate.

  2. The chain has high correlation

Highly correlated draws behave less like an independent sample, so each draw contributes less information and the error in the posterior approximation is larger.

  3. The chain has a tendency to get “stuck”

This will oversample values in one region of the posterior and produce erroneous spikes in the posterior approximation.

Question 6.4: MCMC simulation: thank you for being a friend

Your friend missed class this week and they are allergic to reading textbooks. You decide to answer their questions:

  1. Why is it important to look at MCMC diagnostics?

Diagnostics flag potential warning signs that your simulated posterior is not a good approximation of the true posterior.

  2. Why are MCMC simulations helpful?

When the model becomes too complex to derive the posterior analytically, MCMC lets you run iterations that approximate it.

  3. What are the benefits of using RStan?

RStan allows us to define a Bayesian model structure and simulate the posterior. It runs the MCMC algorithm to get an approximate posterior sample based on the distributions specified in RStan notation. It also supports different ways to visualize the posterior and to assess the accuracy of the approximation with different diagnostic techniques.

  4. What don’t you understand about the chapter?

After class I actually feel pretty good about the concepts in the chapter.

Question 6.5: Beta-binomial grid approximation

Consider the Beta-Binomial model for pi with Y|pi ~ Bin(n,pi) and pi ~ Beta(3,8). Suppose that in n = 10 independent trials you observe Y = 2 successes. The calculated posterior is Beta(5,16).

  1. Utilize grid approximation with grid values pi ∈ {0, 0.25, 0.5, 0.75, 1} to approximate the posterior model
#step 1: discrete pieces

grid_data <- data.frame(pi_grid = seq(from = 0, to = 1, length = 5))

#step 2: evaluate the prior and likelihood

grid_data <- grid_data |> 
  mutate(prior = dbeta(pi_grid, 3, 8),
       likelihood = dbinom(2,10,pi_grid))

#step 3: calculate the products of prior and likelihoods

grid_data <- grid_data |> 
  mutate(unnormalized = likelihood * prior,
         posterior = unnormalized / sum(unnormalized))

#confirm that the posterior sums to 1
grid_data |> 
  summarize(sum(unnormalized),sum(posterior))
##   sum(unnormalized) sum(posterior)
## 1         0.8765603              1
#step 4: draw a sample

set.seed(369)

post_sample <- sample_n(grid_data,size=10000,weight=posterior, replace=TRUE)

post_sample |> 
  tabyl(pi_grid) |> 
  adorn_totals("row")
##  pi_grid     n percent
##     0.25  9643  0.9643
##      0.5   357  0.0357
##    Total 10000  1.0000
ggplot(post_sample,aes(x=pi_grid)) +
  geom_histogram(aes(y=..density..),color="white") +
  stat_function(fun=dbeta,args=list(5,16)) +
  lims(x=c(0,1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).

  2. Repeat part a using a grid of 201 equally spaced values:
#step 1: discrete pieces

grid_data <- data.frame(pi_grid = seq(from = 0, to = 1, length = 201))

#step 2: evaluate the prior and likelihood

grid_data <- grid_data |> 
  mutate(prior = dbeta(pi_grid, 3, 8),
       likelihood = dbinom(2,10,pi_grid))

#step 3: calculate the products of prior and likelihoods

grid_data <- grid_data |> 
  mutate(unnormalized = likelihood * prior,
         posterior = unnormalized / sum(unnormalized))

#confirm that the posterior sums to 1
grid_data |> 
  summarize(sum(unnormalized),sum(posterior))
##   sum(unnormalized) sum(posterior)
## 1          41.79567              1
#step 4: draw a sample

set.seed(369)

post_sample <- sample_n(grid_data,size=10000,weight=posterior, replace=TRUE)

post_sample |> 
  tabyl(pi_grid) |> 
  adorn_totals("row")
##  pi_grid     n percent
##    0.015     2  0.0002
##     0.02     1  0.0001
##    0.025     3  0.0003
##     0.03     1  0.0001
##    0.035     4  0.0004
##     0.04     3  0.0003
##    0.045     6  0.0006
##     0.05    15  0.0015
##    0.055    20  0.0020
##     0.06    15  0.0015
##    0.065    25  0.0025
##     0.07    32  0.0032
##    0.075    32  0.0032
##     0.08    45  0.0045
##    0.085    55  0.0055
##     0.09    65  0.0065
##    0.095    54  0.0054
##      0.1    74  0.0074
##    0.105    92  0.0092
##     0.11    97  0.0097
##    0.115   115  0.0115
##     0.12   107  0.0107
##    0.125   140  0.0140
##     0.13   133  0.0133
##    0.135   144  0.0144
##     0.14   154  0.0154
##    0.145   164  0.0164
##     0.15   175  0.0175
##    0.155   200  0.0200
##     0.16   171  0.0171
##    0.165   193  0.0193
##     0.17   198  0.0198
##    0.175   197  0.0197
##     0.18   223  0.0223
##    0.185   194  0.0194
##     0.19   217  0.0217
##    0.195   216  0.0216
##      0.2   213  0.0213
##    0.205   238  0.0238
##     0.21   237  0.0237
##    0.215   249  0.0249
##     0.22   228  0.0228
##    0.225   243  0.0243
##     0.23   202  0.0202
##    0.235   223  0.0223
##     0.24   204  0.0204
##    0.245   197  0.0197
##     0.25   188  0.0188
##    0.255   197  0.0197
##     0.26   150  0.0150
##    0.265   188  0.0188
##     0.27   173  0.0173
##    0.275   180  0.0180
##     0.28   154  0.0154
##    0.285   181  0.0181
##     0.29   186  0.0186
##    0.295   149  0.0149
##      0.3   147  0.0147
##    0.305   137  0.0137
##     0.31   133  0.0133
##    0.315   156  0.0156
##     0.32   133  0.0133
##    0.325   126  0.0126
##     0.33    99  0.0099
##    0.335    96  0.0096
##     0.34    94  0.0094
##    0.345    98  0.0098
##     0.35    89  0.0089
##    0.355    82  0.0082
##     0.36    96  0.0096
##    0.365    89  0.0089
##     0.37    67  0.0067
##    0.375    63  0.0063
##     0.38    69  0.0069
##    0.385    64  0.0064
##     0.39    58  0.0058
##    0.395    36  0.0036
##      0.4    44  0.0044
##    0.405    48  0.0048
##     0.41    33  0.0033
##    0.415    27  0.0027
##     0.42    33  0.0033
##    0.425    26  0.0026
##     0.43    22  0.0022
##    0.435    29  0.0029
##     0.44    30  0.0030
##    0.445    18  0.0018
##     0.45    24  0.0024
##    0.455    23  0.0023
##     0.46    14  0.0014
##    0.465    14  0.0014
##     0.47    12  0.0012
##    0.475    16  0.0016
##     0.48     6  0.0006
##    0.485    15  0.0015
##     0.49     8  0.0008
##    0.495     7  0.0007
##      0.5     8  0.0008
##    0.505     2  0.0002
##     0.51     7  0.0007
##    0.515     7  0.0007
##     0.52     1  0.0001
##    0.525     3  0.0003
##     0.53     2  0.0002
##    0.535     5  0.0005
##     0.54     4  0.0004
##    0.545     3  0.0003
##     0.55     1  0.0001
##    0.555     2  0.0002
##     0.56     1  0.0001
##    0.565     1  0.0001
##     0.57     2  0.0002
##    0.575     1  0.0001
##     0.58     2  0.0002
##    0.585     1  0.0001
##     0.59     1  0.0001
##     0.61     2  0.0002
##    0.635     1  0.0001
##    Total 10000  1.0000
ggplot(post_sample,aes(x=pi_grid)) +
  geom_histogram(aes(y=..density..),color="white") +
  stat_function(fun=dbeta,args=list(5,16)) +
  lims(x=c(0,1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).

Question 6.6: Gamma-Poisson grid approximation

Consider the Gamma-Poisson model for lambda with Yi | lambda ~ Pois(lambda) and lambda ~ Gamma(20,5). Suppose you observe n = 3 independent data points (Y1,Y2,Y3) = (0,1,0).

  1. Use grid approximation with grid values lambda ∈ {0,1,2,…,8} to approximate the posterior model of lambda
#step 1: from 0 to 8 equally spaced

grid_data <- data.frame(lambda_grid = seq(from=0,to=8,length=9))

#step 2: using the gamma values 20,5 and the values for Y in each position for dpois

grid_data <- grid_data |> 
  mutate(prior=dgamma(lambda_grid,20,5),
         likelihood=dpois(0,lambda_grid) * dpois(1,lambda_grid) * dpois(0,lambda_grid))

#step 3: normalize the posterior 

grid_data <- grid_data |> 
  mutate(unnormalized=likelihood*prior,
         posterior=unnormalized/sum(unnormalized))

#step 4: draw a sample

set.seed(369)

post_sample <- sample_n(grid_data,size=10000,
                        weight=posterior,replace=TRUE)

Then we can see the graph of the posterior:

ggplot(post_sample,aes(x=lambda_grid)) +
  geom_histogram(aes(y=..density..),color="white") +
  stat_function(fun=dgamma,args=list(20,5))+
  lims(x=c(0,8))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).

  2. Repeat part a with a grid of 201 equally spaced values
grid_data <- data.frame(lambda_grid = seq(from=0,to=8,length=201))

grid_data <- grid_data |> 
  mutate(prior=dgamma(lambda_grid,20,5),
         likelihood=dpois(0,lambda_grid) * dpois(1,lambda_grid) * dpois(0,lambda_grid))

grid_data <- grid_data |> 
  mutate(unnormalized=likelihood*prior,
         posterior=unnormalized/sum(unnormalized))

set.seed(369)

post_sample <- sample_n(grid_data,size=10000,
                        weight=posterior,replace=TRUE)

ggplot(post_sample,aes(x=lambda_grid)) +
  geom_histogram(aes(y=..density..),color="white") +
  stat_function(fun=dgamma,args=list(20,5))+
  lims(x=c(0,8))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).

Question 6.7: Normal-Normal grid approximation

Consider the Normal-Normal model for mu with Yi | mu ~ N(mu,1.3^2) and mu ~ N(10,1.2^2). Suppose that on n = 4 independent observations you observe: (Y1,Y2,Y3,Y4) = (7.1,8.9,8.4,8.6)

  1. Use grid approximation with grid values mu ∈ {5,6,7,…,15} to approximate the posterior model of mu
#step 1: define a grid of values

grid_data <- data.frame(mu_grid = seq(from = 5, to = 15, length = 11))

#step 2: evaluate the prior and likelihoods using the prior values 10 and 1.2 and the observation model with sd = 1.3

grid_data <- grid_data |> 
  mutate(prior = dnorm(mu_grid,mean=10,sd=1.2),
        likelihood = dnorm(7.1,mean=mu_grid,sd=1.3)*
          dnorm(8.9,mean=mu_grid,sd=1.3)*
          dnorm(8.4,mean=mu_grid,sd=1.3)*
          dnorm(8.6,mean=mu_grid,sd=1.3))

#step 3: approximate the posterior

grid_data <- grid_data |> 
  mutate(unnormalized = likelihood * prior,
         posterior = unnormalized / sum(unnormalized))

#plot it out to see what's what

ggplot(grid_data,aes(x=mu_grid,y=posterior)) +
  geom_point() +
  geom_segment(aes(x=mu_grid,xend=mu_grid,y=0,yend=posterior))

This shows that the highest posterior density is between 8 and 9.

  2. Repeat part a using a grid of 201 values
#step 1: define a grid of values

grid_data <- data.frame(mu_grid = seq(from = 5, to = 15, length = 201))

#step 2: evaluate the prior and likelihoods using the prior values 10 and 1.2 and the observation model with sd = 1.3

grid_data <- grid_data |> 
  mutate(prior = dnorm(mu_grid,mean=10,sd=1.2),
        likelihood = dnorm(7.1,mean=mu_grid,sd=1.3)*
          dnorm(8.9,mean=mu_grid,sd=1.3)*
          dnorm(8.4,mean=mu_grid,sd=1.3)*
          dnorm(8.6,mean=mu_grid,sd=1.3))

#step 3: approximate the posterior

grid_data <- grid_data |> 
  mutate(unnormalized = likelihood * prior,
         posterior = unnormalized / sum(unnormalized))

#plot it out to see what's what

ggplot(grid_data,aes(x=mu_grid,y=posterior)) +
  geom_point() +
  geom_segment(aes(x=mu_grid,xend=mu_grid,y=0,yend=posterior))

With more grid values we can see more clearly that the posterior density is highest between 8 and 9.

Question 6.8: The Curse of Dimensionality

  1. Describe a situation in which we would want to have inference for multiple parameters

This is like the Michelle polling example, where you want a posterior that involves multiple nested parameters (e.g., voting patterns in multiple states). Another example would be looking at win probabilities across a football league, with multiple teams and many factors influencing wins for each individual team.

  2. Explain how dimensionality can affect grid approximation and why this is a curse

Grid approximation depends on the size of the grid: the finer the grid, the better the approximation. However, with two dimensions instead of one, the number of grid values you have to evaluate is squared; with three dimensions it is cubed. In general, a grid of g values per parameter requires g^d evaluations for d parameters, and this quickly gets out of control.
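A quick sketch of this growth (my own illustration, assuming 100 grid values per parameter):

```r
# Total grid evaluations needed for d parameters, with 100 grid values each
g <- 100
d <- 1:5
data.frame(parameters = d, evaluations = g^d)
# with 5 parameters the grid already needs 10^10 evaluations
```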

Question 6.9: Comparing MCMC to grid approximation

  1. What drawbacks do MCMC and grid approximation share?

Both MCMC and grid approximation only approximate the posterior, and the quality of the approximation depends on tuning choices: the fineness of the grid, or the number of chains and iterations. Both also become more computationally demanding as the model grows more complex.

  2. What advantages do MCMC and grid approximation share?

They both let you approximate posteriors for models too complex to derive analytically. They both produce a sample of N theta values.

  3. What is an advantage of grid approximation over MCMC?

Grid approximation takes an independent sample of theta values from its discrete approximation of the posterior, whereas MCMC draws are dependent on one another.

  4. What is an advantage of MCMC over grid approximation?

MCMC is more flexible for complicated models and runs a number of chains, which increases the reliability of the approximation.

Question 6.10: Is it a Markov Chain?

Below are examples of chains for different probability parameters theta. For each example, determine whether the given chain is a Markov chain. Explain.

  1. You go out to eat N nights in a row and theta_i is the probability you go to a Thai restaurant on day i.

This is a Markov chain: whether you go for Thai on day i depends on what you did the previous day (the current state), but not on the more distant past.

  2. You play the lottery N days in a row and theta_i is the probability you win the lottery on day i

This is not a Markov chain. The probability that you win the lottery isn’t affected by whether you played or won the previous day; the outcomes are independent from day to day.

  3. You play your roommate in chess for N games in a row and theta_i is the probability you win game i against your roommate

This is not a Markov chain. You have no information linking the outcome of one game to the next; all you know is that you played.

Question 6.11: MCMC with RStan: Step 1

Use the given info to define the Bayesian model structure using the correct RStan syntax. Where the exercise doesn’t specify a quantity (like the length of the data), I leave a **placeholder** to be filled in.

  1. Y|pi ~ Bin(20,pi) with pi ~ Beta(1,1)
bin_model = '
data {
int<lower = 0, upper = 20> Y;
}
parameters {
real<lower = 0, upper = 1>pi;
}
model {
Y ~ binomial(20,pi);
pi ~ beta(1,1);
}
'
  2. Y|lambda ~ Pois(lambda) with lambda ~ Gamma(4,2)
gamma_model ='
data {
int<lower = 0> Y;
}
parameters {
real<lower = 0>lambda;
}
model {
Y ~ poisson(lambda);
lambda ~ gamma(4,2);
}
'
  3. Y|mu ~ N(mu,1^2) with mu ~ N(0,10^2)
normal_model = '
data {
    vector[**this is the size of the data vector**] Y;
}
parameters {
    real mu;
}
model {
    Y ~ normal(mu, 1);
    mu ~ normal(0, 10);
}
'

Question 6.12: MCMC with RStan: Steps 1 and 2

Use the given info to (1) define the Bayesian model structure, and (2) simulate the posterior using correct RStan syntax:

  1. Y|pi ~ Bin(20,pi) with pi ~ Beta(1,1) with Y = 12
#define the model

bb_model <- "
data {
int<lower = 0, upper = 20> Y;
}
parameters {
real<lower = 0, upper = 1>pi;
}
model {
Y ~ binomial(20,pi);
pi ~ beta(1,1);
}
"

#simulate the posterior

bb_sim <- stan(model_code = bb_model,data=list(Y=12),chains=4,iter=5000*2,seed=369)
  2. Y|lambda ~ Pois(lambda) with lambda ~ Gamma(4,2) with Y = 3
#define the model

gp_model = "
data {
int<lower = 0> Y;
}
parameters {
real<lower = 0>lambda;
}
model {
Y ~ poisson(lambda);
lambda ~ gamma(4,2);
}
"

gp_sim <- stan(model_code = gp_model, data=list(Y=3),
               chains=4,iter=5000*2,seed=369)
  3. Y|mu ~ N(mu,1^2) with mu ~ N(0,10^2)
#define the model

normal_model = '
data {
    vector[4] Y;   // length matches the 4 observations supplied below
}
parameters {
    real mu;
}
model {
    Y ~ normal(mu, 1);
    mu ~ normal(0, 10);
}
'

#simulate the posterior

d <- list(Y = c(7.1,8.9,8.4,8.6))
nm_sim <- stan(model_code = normal_model, data = d, chains = **number**, iter = **number * 2 **, seed = 369)

Question 6.13: MCMC with RStan: Beta-Binomial

Consider the beta-binomial model for pi with Y|pi ~ Bin(n,pi) and pi ~ Beta(3,8). Suppose that in n = 10 independent trials, you observe Y = 2 successes.

  1. Simulate the posterior model of pi with RStan using 3 chains and 12000 iterations per chain
#step 1: define the model

bb_model <- "
data {
int<lower = 0, upper = 10> Y;
}
parameters {
real<lower = 0, upper = 1>pi;
}
model {
Y ~ binomial(10,pi);
pi ~ beta(3,8);
}
"

#Step 2: simulate the posterior
bb_sim <- stan(model_code = bb_model,data=list(Y=2),chains=3,iter=6000*2,seed=369)
  2. Produce trace plots for each of the three chains
mcmc_trace(bb_sim,pars="pi",size=0.1)

  3. What is the range of values on the trace plot x-axis? Why is the maximum value of this range not 12000?

The x-axis runs from 1 to 6000. By default, Stan “burns” (discards as warmup) the first half of the iterations and keeps only the second half, where the behavior of the chain is theoretically more stable.

  4. Create a density plot of the values for each of the three chains
#density plot of markov chain values
mcmc_dens(bb_sim,pars="pi") +
  yaxis_text(TRUE) +
  ylab("density")

  5. Specify the posterior model of pi. How does the MCMC approximation compare?

Taking the formula for updating the Beta with data values, the posterior we would get from a prior of Beta(3,8) with Y = 2 and n = 10 is Beta(5,16). This has a mean of 5 / (5 + 16) ≈ 0.238 and a mode of (5 - 1) / (5 + 16 - 2) ≈ 0.211. This looks pretty close to the visual representation we get above!
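We can verify those summaries quickly in base R (my own check, using the standard Beta mean and mode formulas):

```r
# Mean and mode of the Beta(5,16) posterior
a <- 5
b <- 16
a / (a + b)            # mean: 5/21, about 0.238
(a - 1) / (a + b - 2)  # mode: 4/19, about 0.211
```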

Question 6.14: MCMC with RStan: once more with feeling

Repeat the above exercise for a Beta-Binomial model with Y|pi ~ Bin(n,pi) and pi ~ Beta(4,3) where Y = 4 successes in n = 12 independent trials:

#step 1: define the model

bb_model <- "
data {
int<lower = 0, upper = 12> Y;
}
parameters {
real<lower = 0, upper = 1>pi;
}
model {
Y ~ binomial(12,pi);
pi ~ beta(4,3);
}
"

#Step 2: simulate the posterior
bb_sim <- stan(model_code = bb_model,data=list(Y=4),chains=3,iter=6000*2,seed=369)
  1. Trace plots
mcmc_trace(bb_sim,pars="pi",size=0.1)

  2. Range is again 6000 because we burn the first half of our 12000 iterations

  3. Density plot

#density plot of markov chain values
mcmc_dens(bb_sim,pars="pi") +
  yaxis_text(TRUE) +
  ylab("density")

  4. Compare to the posterior

From prior Beta(4,3), Y = 4, n = 12 –> posterior Beta(8,11)

Mean = 8 / (8 + 11) ≈ 0.421

Mode = (8 - 1) / (8 + 11 - 2) = 7/17 ≈ 0.412

This looks like it stacks up!

Question 6.15: MCMC with RStan: Gamma-Poisson

Consider the Gamma-Poisson model with Yi | lambda ~ Pois(lambda) and lambda ~ Gamma(20,5) and you observe n = 3 independent data points (Y1,Y2,Y3) = (0,1,0).

  1. Simulate the posterior model of lambda with RStan using 4 chains and 10000 iterations per chain
#step 1: define the model

gp_model <- "
data {
int<lower = 0> Y[3];
}
parameters {
real<lower = 0>lambda;
}
model {
Y ~ poisson(lambda);
lambda ~ gamma(20,5);
}
"

#step 2: simulate the posterior

gp_sim <- stan(model_code = gp_model, data=list(Y=c(0,1,0)),
               chains=4,iter=5000*2,seed=369)
  2. Produce trace and density plots for all four chains
#trace plots of the 4 markov chains
mcmc_trace(gp_sim,pars="lambda",size=0.1)

#density plot of the markov chain values
mcmc_dens(gp_sim,pars="lambda") +
  yaxis_text(TRUE) +
  ylab("density")

  3. From the density plots, what seems to be the most posterior plausible value of lambda?

I would guess around 2.7

  4. Specify the posterior model of lambda. How does it compare to the MCMC approximation?

From chapter 5, I’m pulling the formula that the posterior gamma is: Gamma (s + sum(yi), r + n). The mean is: \(\frac{s}{r}\) and the mode is \(\frac{s - 1}{r} \text{ for s > 1}\)

The prior for this model had s = 20 and r = 5 and the sum of yi is 1 with n = 3.

s_posterior = s + sum(yi) = 20 + 1 = 21

r_posterior = r + n = 5 + 3 = 8

Posterior Gamma(21,8)

Mean = 21 / 8 = 2.625

Mode = (21 - 1) / 8 = 2.5

These values feel consistent with the density plot we produced with MCMC.
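A quick base-R check of those Gamma(21,8) summaries (my own, using the Gamma mean and mode formulas from chapter 5):

```r
# Mean and mode of the Gamma(21,8) posterior
s <- 21
r <- 8
s / r        # mean: 2.625
(s - 1) / r  # mode: 2.5
```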

Question 6.16: MCMC with RStan: another Gamma-Poisson

Do 6.15 again with lambda ~ Gamma(5,5) prior model

#step 1: define the model

gp_model <- "
data {
int<lower = 0> Y[3];
}
parameters {
real<lower = 0>lambda;
}
model {
Y ~ poisson(lambda);
lambda ~ gamma(5,5);
}
"

#step 2: simulate the posterior

gp_sim <- stan(model_code = gp_model, data=list(Y=c(0,1,0)),
               chains=4,iter=5000*2,seed=369)
#trace plots of the 4 markov chains
mcmc_trace(gp_sim,pars="lambda",size=0.1)

#density plot of the markov chain values
mcmc_dens(gp_sim,pars="lambda") +
  yaxis_text(TRUE) +
  ylab("density")

  1. The most plausible value of lambda looks to be around 0.6

  2. Specify the posterior model and compare:

Gamma (s + sum(yi), r + n). The mean is: \(\frac{s}{r}\) and the mode is \(\frac{s - 1}{r} \text{ for s > 1}\)

The prior for this model had s = 5 and r = 5 and the sum of yi is 1 with n = 3.

s_posterior = 5 + 1 = 6

r_posterior = 5 + 3 = 8

Posterior Gamma(6,8)

Mean = 6 / 8 = 0.75

Mode = (6 - 1) / 8 = 0.625

This feels consistent with the graph produced by MCMC.

Question 6.17: MCMC with RStan: Normal-Normal

Repeat 6.15 for the Normal-Normal model with Yi|mu ~ N(mu,1.3^2) and mu ~ N(10,1.2^2). Suppose that on n = 4 independent observations you observe: (Y1,Y2,Y3,Y4) = (7.1,8.9,8.4,8.6)

  1. Simulate the posterior model using 4 chains and 10000 iterations
#define the model

normal_model = '
data {
    vector[4] Y;
}
parameters {
    real mu;
}
model {
    Y ~ normal(mu, 1.3);
    mu ~ normal(10,1.2);
}
'

#simulate the posterior

d <- list(Y = c(7.1,8.9,8.4,8.6))
nm_sim <- stan(model_code = normal_model, data = d, chains = 4, iter=5000*2,seed=369 )
  2. Produce trace and density plots for all chains
#trace plots of the 4 markov chains
mcmc_trace(nm_sim,pars="mu",size=0.1)

#density plot of the markov chain values
mcmc_dens(nm_sim,pars="mu") +
  yaxis_text(TRUE) +
  ylab("density")

  3. What seems to be the most plausible posterior value for mu?

This appears to be a bit under 9

  4. How does the MCMC approximation compare to the actual posterior?

Taking from chapter 5, the posterior for mu in the Normal-Normal model is: \(N(\theta\frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y}\frac{n\tau^2}{n\tau^2 + \sigma^2},\frac{\tau^2\sigma^2}{n\tau^2 + \sigma^2})\)

Since the posterior is normal, its mean and mode are both the posterior mean of mu.

Prior and data values of interest are:

sigma^2 = 1.3^2, theta = 10, tau^2 = 1.2^2

n = 4, y_mean = (7.1 + 8.9 + 8.4 + 8.6) / 4 = 8.25

mu_posterior = theta * sigma^2 / (n*tau^2 + sigma^2) + y_mean * n*tau^2 / (n*tau^2 + sigma^2)

sig_posterior = (tau^2 * sigma^2) / (n*tau^2 + sigma^2)

sig_pr <- 1.3^2
theta <- 10
tau_pr <- 1.2^2
n <- 4
y_mean <- 8.25


mu_post <-  (theta * (sig_pr / (n*tau_pr + sig_pr))) + (y_mean * (n*tau_pr / (n*tau_pr + sig_pr)))

sig_post = (tau_pr * sig_pr) / ((n*tau_pr) + sig_pr)

mu_post
## [1] 8.64698
sig_post
## [1] 0.3266577

This is close to my guess of around 9 for mu.
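The same closed-form update can be wrapped in a small helper (my own sketch; the function name and interface are mine) so the check is easy to rerun, for example on question 6.18:

```r
# Closed-form Normal-Normal update: posterior mean and variance for
# prior mu ~ N(theta, tau^2) and data Yi | mu ~ N(mu, sigma^2)
normal_normal_post <- function(theta, tau, sigma, y) {
  n <- length(y)
  y_bar <- mean(y)
  denom <- n * tau^2 + sigma^2
  c(mean = theta * sigma^2 / denom + y_bar * n * tau^2 / denom,
    var  = tau^2 * sigma^2 / denom)
}

normal_normal_post(theta = 10, tau = 1.2, sigma = 1.3,
                   y = c(7.1, 8.9, 8.4, 8.6))
# mean ~ 8.65, var ~ 0.33
```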

Question 6.18: MCMC with RStan: Normal-Normal once again

Repeat 6.17 with Yi|mu ~ N(mu, 8^2) and mu ~ N(-14,2^2). Suppose that on n = 5 independent observations you observe data: (Y1,Y2,Y3,Y4,Y5) = (-10.1,5.5,0.1,-1.4,11.5)

  1. Simulate the posterior model with 4 chains and 10000 iterations per chain
#define the model

normal_model_2 = '
data {
    vector[5] Y;
}
parameters {
    real mu;
}
model {
    Y ~ normal(mu, 8);
    mu ~ normal(-14,2);
}
'

#simulate the posterior

d <- list(Y = c(-10.1,5.5,0.1,-1.4,11.5))
nm_sim_2 <- stan(model_code = normal_model_2, data = d, chains = 4, iter=5000*2,seed=369 )
## 
## SAMPLING FOR MODEL '1f1e8ae7e4a8d2fbd986043f4b98d2de' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 9e-06 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.016346 seconds (Warm-up)
## Chain 1:                0.016642 seconds (Sampling)
## Chain 1:                0.032988 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '1f1e8ae7e4a8d2fbd986043f4b98d2de' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 2e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.02 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.016504 seconds (Warm-up)
## Chain 2:                0.017021 seconds (Sampling)
## Chain 2:                0.033525 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL '1f1e8ae7e4a8d2fbd986043f4b98d2de' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 2e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.02 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.016529 seconds (Warm-up)
## Chain 3:                0.018045 seconds (Sampling)
## Chain 3:                0.034574 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL '1f1e8ae7e4a8d2fbd986043f4b98d2de' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 1e-06 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.016261 seconds (Warm-up)
## Chain 4:                0.016838 seconds (Sampling)
## Chain 4:                0.033099 seconds (Total)
## Chain 4:
  2. Produce trace and density plots
#trace plots of the 4 markov chains
mcmc_trace(nm_sim_2,pars="mu",size=0.1)

#density plot of the markov chain values
mcmc_dens(nm_sim_2,pars="mu") +
  yaxis_text(TRUE) +
  ylab("density")

  3. What seems to be the most posterior plausible value for mu?

This shows mu is a little below -10, let’s say -10.5

  4. How does the MCMC approximation compare to the actual posterior?

Using the same approach as above:

y_mean = (-10.1 + 5.5 + 0.1 - 1.4 + 11.5) / 5 = 1.12, with n = 5

Yi|mu ~ N(mu, 8^2) and mu ~ N(-14,2^2)

sig_pr <- 8^2
theta <- -14
tau_pr <- 2^2
n <- 5
y_mean <- 1.12


mu_post <-  (theta * (sig_pr / (n*tau_pr + sig_pr))) + (y_mean * (n*tau_pr / (n*tau_pr + sig_pr)))

sig_post = (tau_pr * sig_pr) / ((n*tau_pr) + sig_pr)

mu_post
## [1] -10.4
sig_post
## [1] 3.047619

This is really spot on for the mu!! Wow, I’m so good at reading graphs!!